Improving precision in concept normalization.
Abstract
Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases, there are reasons to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision that uses a variety of both pre-processing and post-processing methods. These methods draw on both knowledge-based and frequentist approaches to modeling language. Building on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.
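The reliance on a Zipfian distribution of false positives can be illustrated with a small sketch: if a handful of strings account for most of the spurious matches, blocking only those few strings should remove a large share of the false positives. The Python below is a toy illustration under that assumption, not the pipeline described in the abstract. The annotation data, the function names, and the cutoff k are all invented for the example.

```python
from collections import Counter


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); defined here as 0.0 when nothing was predicted."""
    total = tp + fp
    return tp / total if total else 0.0


def top_false_positive_terms(false_positives, k):
    """Return the k most frequent false-positive surface strings."""
    return [term for term, _ in Counter(false_positives).most_common(k)]


def filter_annotations(annotations, stoplist):
    """Drop every annotation whose surface string is on the stoplist."""
    blocked = set(stoplist)
    return [a for a in annotations if a[0] not in blocked]


def score(annotations):
    """Compute precision over (surface string, is_correct) pairs."""
    tp = sum(1 for _, ok in annotations if ok)
    fp = len(annotations) - tp
    return precision(tp, fp)


# Toy annotations, invented for illustration: (surface string, is_correct).
annotations = (
    [("cell", True)] * 6
    + [("protein", True)] * 4
    + [("can", False)] * 5   # a frequent spurious match
    + [("for", False)] * 3
    + [("lead", False)] * 1
    + [("rat", False)] * 1
)

false_positives = [term for term, ok in annotations if not ok]
stoplist = top_false_positive_terms(false_positives, k=2)
filtered = filter_annotations(annotations, stoplist)

before = score(annotations)
after = score(filtered)
print(f"stoplist: {stoplist}")
print(f"precision before: {before:.2f}, after: {after:.2f}")
```

In this toy data, blocking the two most frequent false-positive strings removes 8 of the 10 false positives. A string-level stoplist like this would also discard any correct annotations that share a blocked string, so in practice the gain in precision has to be weighed against the loss in recall.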
Related resources
Improving Search and Retrieval Performance through Shortening Documents, Detecting Garbage, and Throwing Out Jargon
This thesis describes the development of a new search and retrieval system used to index and process queries for several different data sets of documents. This thesis also describes my work with the TREC Legal data set, in particular, the new algorithms I designed to improve recall and precision rates in the legal domain. I have applied novel normalization techniques that are designed to slight...
Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion
The rapidly growing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditio...
Improving Term Frequency Normalization for Multi-topical Documents and Application to Language Modeling Approaches
Term frequency normalization is a serious issue because document lengths vary. Generally, documents become long for two different reasons: verbosity and multi-topicality. First, verbosity means that the same topic is repeatedly mentioned by terms related to the topic, so that term frequency is higher than in a well-summarized document. Second, multi-topicality indicates that a docu...
Integrated cTAKES for Concept Mention Detection and Normalization
We participated in Task 1 using an existing system, MedTagger, implemented in integrated cTAKES (icTAKES). Concept mention detection is based on Conditional Random Fields (CRF), and concept mention normalization is based on a greedy dictionary lookup algorithm. A distinctive feature of MedTagger compared to other concept mention detection systems is the incorporation of dictionary lookup resu...
A survey on the comparison between precision and traditional agriculture by budgeting method
The present study was conducted to compare precision and traditional agriculture using a budgeting technique. Its statistical population consists of 210 experts in the agricultural jihad organization of Qom province. The validity of the questionnaire as a research tool was confirmed by professors, while its reliability was corroborated by Cronbach's alpha in the range 0.78-0.94. According to the findings, ther...
Journal: Pacific Symposium on Biocomputing
Volume: 23
Publication date: 2018